Huddersfield
Essay cheating at universities an 'open secret'
A BBC investigation has uncovered claims that essay cheating remains widespread at UK universities despite the introduction of a law designed to stop it. Since April 2022, it has been illegal to provide essays for students in post-16 education in England. But so far there have been no prosecutions. The BBC has spoken to a former lecturer who describes essay cheating as an open secret and to a businessman who claims to have made millions from selling model answer essays to university students. Universities UK, which represents 141 institutions, said there were severe penalties for students caught submitting work that was not their own.
- North America > United States (0.15)
- North America > Central America (0.14)
- Europe > United Kingdom > England > West Yorkshire > Huddersfield (0.05)
Perceptually Aligning Representations of Music via Noise-Augmented Autoencoders
Bjare, Mathias Rose, Cantisani, Giorgia, Pasini, Marco, Lattner, Stefan, Widmer, Gerhard
We argue that training autoencoders to reconstruct inputs from noised versions of their encodings, when combined with perceptual losses, yields encodings that are structured according to a perceptual hierarchy. We demonstrate the emergence of this hierarchical structure by showing that, after training an audio autoencoder in this manner, perceptually salient information is captured in coarser representation structures than with conventional training. Furthermore, we show that such perceptual hierarchies improve latent diffusion decoding in the context of estimating surprisal in music pitches and predicting EEG-brain responses to music listening. Pretrained weights are available on github.com/CPJKU/pa-audioic.
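The training idea in this abstract, reconstructing an input from a noised version of its encoding, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the linear encoder and decoder, the Gaussian noise model, the `noise_std` parameter, and the use of mean squared error as a stand-in for the perceptual losses the authors mention. It is not the authors' implementation.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def noisy_reconstruction_loss(x, enc, dec, noise_std, rng):
    """Encode x, add Gaussian noise to the code, decode, and return the MSE.

    MSE is only a placeholder here; the abstract refers to perceptual losses.
    """
    z = matvec(enc, x)                                       # clean encoding
    z_noisy = [zi + rng.gauss(0.0, noise_std) for zi in z]   # noised encoding
    x_hat = matvec(dec, z_noisy)                             # reconstruction
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Hypothetical 2-D example with identity encoder and decoder.
identity = [[1.0, 0.0], [0.0, 1.0]]
rng = random.Random(0)
clean_loss = noisy_reconstruction_loss([0.5, -1.0], identity, identity, 0.0, rng)
noisy_loss = noisy_reconstruction_loss([0.5, -1.0], identity, identity, 0.3, rng)
```

With zero noise the identity autoencoder reconstructs exactly, while any positive `noise_std` perturbs the code before decoding. Minimising this loss during training would push the decoder to tolerate perturbed codes, which is the setting the abstract describes.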
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > United Kingdom > England > West Yorkshire > Huddersfield (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Dataset Creation and Baseline Models for Sexism Detection in Hausa
Muhammad, Fatima Adam, Hassan, Shamsuddeen Muhammad, Inuwa-Dutse, Isa
Sexism reinforces gender inequality and social exclusion by perpetuating stereotypes, bias, and discriminatory norms. Noting how online platforms enable various forms of sexism to thrive, there is a growing need for effective sexism detection and mitigation strategies. While computational approaches to sexism detection are widespread in high-resource languages, progress remains limited in low-resource languages where limited linguistic resources and cultural differences affect how sexism is expressed and perceived. This study introduces the first Hausa sexism detection dataset, developed through community engagement, qualitative coding, and data augmentation. For cultural nuances and linguistic representation, we conducted a two-stage user study (n=66) involving native speakers to explore how sexism is defined and articulated in everyday discourse. We further experiment with both traditional machine learning classifiers and pre-trained multilingual language models, and evaluate the effectiveness of few-shot learning in detecting sexism in Hausa. Our findings highlight challenges in capturing cultural nuance, particularly with clarification-seeking and idiomatic expressions, and reveal a tendency for many false positives in such cases.
- Africa > Nigeria > Jigawa State > Dutse (0.05)
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > West Yorkshire > Huddersfield (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Research Report > New Finding (0.49)
- Public Relations > Community Relations (0.34)
Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents
Wang, Rui, Zhang, Ce, Ma, Jun-Yu, Zhang, Jianshu, Wang, Hongru, Chen, Yi, Xue, Boyang, Fang, Tianqing, Zhang, Zhisong, Zhang, Hongming, Mi, Haitao, Yu, Dong, Wong, Kam-Fai
Deep research web agents not only retrieve information from diverse sources such as web environments, files, and multimodal inputs, but more importantly, they need to rigorously analyze and aggregate knowledge for insightful research. However, existing open-source deep research agents predominantly focus on enhancing information-seeking capabilities of web agents to locate specific information, while overlooking the essential need for information aggregation, which would limit their ability to support in-depth research. We propose an Explore to Evolve paradigm to scalably construct verifiable training data for web agents. Beginning with proactive online exploration, an agent sources grounded information by exploring the real web. Using the collected evidence, the agent then self-evolves an aggregation program by selecting, composing, and refining operations from 12 high-level logical types to synthesize a verifiable QA pair. This evolution from high-level guidance to concrete operations allowed us to scalably produce WebAggregatorQA, a dataset of 10K samples across 50K websites and 11 domains. Based on an open-source agent framework, SmolAgents, we collect supervised fine-tuning trajectories to develop a series of foundation models, WebAggregator. WebAggregator-8B matches the performance of GPT-4.1, while the 32B variant surpasses GPT-4.1 by more than 10% on GAIA-text and closely approaches Claude-3.7-sonnet. Moreover, given the limited availability of benchmarks that evaluate web agents' information aggregation abilities, we construct a human-annotated evaluation split of WebAggregatorQA as a challenging test set. On this benchmark, Claude-3.7-sonnet only achieves 28%, and GPT-4.1 scores 25.8%. Even when agents manage to retrieve all references, they still struggle on WebAggregatorQA, highlighting the need to strengthen the information aggregation capabilities of web agent foundations.
- North America > United States > New York (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Leisure & Entertainment > Sports (1.00)
- Media (0.93)
The Rise of AfricaNLP: Contributions, Contributors, and Community Impact (2005-2025)
Belay, Tadesse Destaw, Hussen, Kedir Yassin, Imam, Sukairaj Hafiz, Ahmad, Ibrahim Said, Inuwa-Dutse, Isa, Haile, Abrham Belete, Sidorov, Grigori, Ameer, Iqra, Abdulmumin, Idris, Gwadabe, Tajuddeen, Marivate, Vukosi, Yimam, Seid Muhie, Muhammad, Shamsuddeen Hassan
Natural Language Processing (NLP) is undergoing constant transformation, as Large Language Models (LLMs) are driving daily breakthroughs in research and practice. In this regard, tracking the progress of NLP research and automatically analyzing the contributions of research papers provides key insights into the nature of the field and the researchers. This study explores the progress of African NLP (AfricaNLP) by asking (and answering) basic research questions such as: i) How has the nature of NLP evolved over the last two decades?, ii) What are the contributions of AfricaNLP papers?, and iii) Which individuals and organizations (authors, affiliated institutions, and funding bodies) have been involved in the development of AfricaNLP? We quantitatively examine the contributions of AfricaNLP research using 1.9K NLP paper abstracts, 4.9K author contributors, and 7.8K human-annotated contribution sentences (AfricaNLPContributions) along with benchmark results. Our dataset and continuously existing NLP progress tracking website provide a powerful lens for tracing AfricaNLP research trends and hold potential for generating data-driven literature surveys.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)
- Europe > Austria > Vienna (0.14)
- North America > United States > California (0.14)
- Overview (1.00)
- Research Report > New Finding (0.88)
- Government > Regional Government (1.00)
- Education (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
OpenAI's GPT-OSS-20B Model and Safety Alignment Issues in a Low-Resource Language
In response to the recent safety probing for OpenAI's GPT-OSS-20B model, we present a summary of a set of vulnerabilities uncovered in the model, focusing on its performance and safety alignment in a low-resource language setting. The core motivation for our work is to question the model's reliability for users from underrepresented communities. Using Hausa, a major African language, we uncover biases, inaccuracies, and cultural insensitivities in the model's behaviour. With minimal prompting, our red-teaming efforts reveal that the model can be induced to generate harmful, culturally insensitive, and factually inaccurate content in the language. As a form of reward hacking, we note how the model's safety protocols appear to relax when prompted with polite or grateful language, leading to outputs that could facilitate misinformation and amplify hate speech. For instance, the model operates on the false assumption that a common insecticide locally known as Fiya-Fiya (Cypermethrin) and a rodenticide like Shinkafar Bera (a form of Aluminium Phosphide) are safe for human consumption. To contextualise the severity of this error and the popularity of the substances, we conducted a survey (n=61) in which 98% of participants identified them as toxic. Additional failures include an inability to distinguish between raw and processed foods and the incorporation of demeaning cultural proverbs to build inaccurate arguments. We surmise that these issues manifest through a form of linguistic reward hacking, where the model prioritises fluent, plausible-sounding output in the target language over safety and truthfulness. We attribute the uncovered flaws primarily to insufficient safety tuning in low-resource linguistic contexts. By concentrating on a low-resource setting, our approach highlights a significant gap in current red-teaming efforts and offers some recommendations.
- North America > United States (0.29)
- Africa > Nigeria > Jigawa State > Dutse (0.05)
- Europe > United Kingdom > England > West Yorkshire > Huddersfield (0.04)
Representation Learning on Large Non-Bipartite Transaction Networks using GraphSAGE
Tare, Mihir, Rattasits, Clemens, Wu, Yiming, Wielewski, Euan
Financial institutions increasingly require scalable tools to analyse complex transactional networks, yet traditional graph embedding methods struggle with dynamic, real-world banking data. This paper demonstrates the practical application of GraphSAGE, an inductive Graph Neural Network framework, to non-bipartite heterogeneous transaction networks within a banking context. Unlike transductive approaches, GraphSAGE scales well to large networks and can generalise to unseen nodes, which is critical for institutions working with temporally evolving transactional data. We construct a transaction network using anonymised customer and merchant transactions and train a GraphSAGE model to generate node embeddings. Our exploratory work on the embeddings reveals interpretable clusters aligned with geographic and demographic attributes. Additionally, we illustrate their utility in downstream classification tasks by applying them to a money mule detection model where using these embeddings improves the prioritisation of high-risk accounts. Beyond fraud detection, our work highlights the adaptability of this framework to banking-scale networks, emphasising its inductive capability, scalability, and interpretability. This study provides a blueprint for financial organisations to harness graph machine learning for actionable insights in transactional ecosystems.
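To illustrate why an inductive method such as GraphSAGE can embed nodes it has not seen before, the following is a minimal sketch of a single mean-aggregator layer. The node names, feature vectors, identity weight matrices, and the helper names are all hypothetical; the abstract does not describe the authors' architecture, neighbour sampling, or features, so this should be read as a generic illustration rather than their model.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sage_mean_layer(node, features, neighbours, w_self, w_neigh):
    """One mean-aggregator layer: ReLU(W_self h_v + W_neigh mean(h_u))."""
    h_self = features[node]
    neigh_feats = [features[u] for u in neighbours[node]]
    # Isolated nodes fall back to a zero neighbourhood vector.
    h_neigh = mean_vector(neigh_feats) if neigh_feats else [0.0] * len(h_self)
    combined = [a + b for a, b in zip(matvec(w_self, h_self),
                                      matvec(w_neigh, h_neigh))]
    return [max(0.0, c) for c in combined]


# Hypothetical toy transaction graph: one customer linked to two merchants.
features = {
    "customer_a": [1.0, 0.0],
    "merchant_x": [0.0, 2.0],
    "merchant_y": [2.0, 0.0],
}
neighbours = {
    "customer_a": ["merchant_x", "merchant_y"],
    "merchant_x": ["customer_a"],
    "merchant_y": ["customer_a"],
}
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned weights

embedding_a = sage_mean_layer("customer_a", features, neighbours, identity, identity)

# A node added after "training" is embedded with the same weights.
features["customer_new"] = [0.0, 1.0]
neighbours["customer_new"] = ["merchant_x"]
embedding_new = sage_mean_layer("customer_new", features, neighbours, identity, identity)
```

Because the layer depends only on a node's own features and those of its neighbours, the same weights apply to `customer_new` without retraining, which is the inductive property the abstract emphasises.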
- Europe > United Kingdom > England > Greater London > London (0.05)
- North America > United States (0.04)
- Europe > United Kingdom > Northern Ireland (0.04)
- Europe > United Kingdom > England > West Yorkshire > Huddersfield (0.04)
- Banking & Finance (1.00)
- Law Enforcement & Public Safety > Fraud (0.68)
Unsupervised Multi-Attention Meta Transformer for Rotating Machinery Fault Diagnosis
Wang, Hanyang, Yang, Yuxuan, Wang, Hongjun, Wang, Lihui
The intelligent fault diagnosis of rotating mechanical equipment usually requires a large amount of labeled sample data. However, in practical industrial applications, acquiring enough data is both challenging and expensive in terms of time and cost. Moreover, different types of rotating mechanical equipment, each with unique mechanical properties, require separate training of diagnostic models for each case. To address the challenges of limited fault samples and the lack of generalizability in prediction models for practical engineering applications, we propose a Multi-Attention Meta Transformer method for few-shot unsupervised rotating machinery fault diagnosis (MMT-FD). This framework extracts potential fault representations from unlabeled data and demonstrates strong generalization capabilities, making it suitable for diagnosing faults across various types of mechanical equipment. The MMT-FD framework integrates a time-frequency domain encoder and a meta-learning generalization model. The time-frequency domain encoder predicts status representations generated through random augmentations in the time-frequency domain. These enhanced data are then fed into a meta-learning network for classification and generalization training, followed by fine-tuning using a limited amount of labeled data. The model is iteratively optimized using a small number of contrastive learning iterations, resulting in high efficiency. To validate the framework, we conducted experiments on a bearing fault dataset and rotor test bench data. The results demonstrate that the MMT-FD model achieves 99% fault diagnosis accuracy with only 1% of labeled sample data, exhibiting robust generalization capabilities.
- North America > United States (0.14)
- Asia > China > Beijing > Beijing (0.05)
- Europe > United Kingdom > England > West Yorkshire > Huddersfield (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
- Europe > France (0.05)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > West Yorkshire > Huddersfield (0.04)
From Classical Probabilistic Latent Variable Models to Modern Generative AI: A Unified Perspective
From large language models to multi-modal agents, Generative Artificial Intelligence (AI) now underpins state-of-the-art systems. Despite their varied architectures, many share a common foundation in probabilistic latent variable models (PLVMs), where hidden variables explain observed data for density estimation, latent reasoning, and structured inference. This paper presents a unified perspective by framing both classical and modern generative methods within the PLVM paradigm. We trace the progression from classical flat models such as probabilistic PCA, Gaussian mixture models, latent class analysis, item response theory, and latent Dirichlet allocation, through their sequential extensions including Hidden Markov Models, Gaussian HMMs, and Linear Dynamical Systems, to contemporary deep architectures: Variational Autoencoders as Deep PLVMs, Normalizing Flows as Tractable PLVMs, Diffusion Models as Sequential PLVMs, Autoregressive Models as Explicit Generative Models, and Generative Adversarial Networks as Implicit PLVMs. Viewing these architectures under a common probabilistic taxonomy reveals shared principles, distinct inference strategies, and the representational trade-offs that shape their strengths. We offer a conceptual roadmap that consolidates generative AI's theoretical foundations, clarifies methodological lineages, and guides future innovation by grounding emerging architectures in their probabilistic heritage.
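The common foundation this abstract refers to can be written compactly. The notation below is the standard textbook form and is an assumption, not taken from the paper: $x$ is the observed data, $z$ the latent variable, $\theta$ the model parameters, and $q_\phi(z \mid x)$ an approximate posterior such as the one used in variational autoencoders.

```latex
% Marginal likelihood of a latent variable model:
p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, \mathrm{d}z

% Evidence lower bound (ELBO), used when the integral is intractable:
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  - \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)
```

For a Gaussian mixture model, $z$ is discrete and the integral becomes a finite sum over components, so the marginal can be evaluated exactly. Deep models such as variational autoencoders typically optimise the lower bound instead.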
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > West Yorkshire > Huddersfield (0.04)
- Europe > Monaco (0.04)
- Research Report (0.50)
- Workflow (0.46)
- Overview (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)